code review comment
SecureReviewer: Enhancing Large Language Models for Secure Code Review through Secure-aware Fine-tuning
Liu, Fang, Liu, Simiao, Zhu, Yinghao, Lian, Xiaoli, Zhang, Li
Identifying and addressing security issues during the early phases of the development lifecycle is critical for mitigating long-term negative impacts on software systems. Code review serves as an effective practice that enables developers to check their teammates' code before it is integrated into the codebase. To streamline the generation of review comments, various automated code review approaches have been proposed, among which LLM-based methods have significantly advanced the capabilities of automated review generation. However, existing models primarily focus on general-purpose code review, and their effectiveness in identifying and addressing security-related issues remains underexplored. Moreover, adapting existing code review approaches to target security issues faces substantial challenges, including data scarcity and inadequate evaluation metrics. To address these limitations, we propose SecureReviewer, a new approach designed to enhance LLMs' ability to identify and resolve security-related issues during code review. Specifically, we first construct a dataset tailored for training and evaluating secure code review capabilities. Leveraging this dataset, we fine-tune LLMs with our proposed secure-aware fine-tuning strategy to generate code review comments that can effectively identify security issues and provide fix suggestions. To mitigate hallucination in LLMs and enhance the reliability of their outputs, we integrate retrieval-augmented generation (RAG), which grounds the generated comments in domain-specific security knowledge. Additionally, we introduce SecureBLEU, a new evaluation metric designed to assess the effectiveness of review comments in addressing security issues. Experimental results demonstrate that SecureReviewer outperforms state-of-the-art baselines in both security issue detection accuracy and the overall quality and practical utility of generated review comments.
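The abstract does not spell out SecureReviewer's retrieval pipeline, so the following is only a minimal Python sketch of how RAG grounding for review comments could work; `KnowledgeEntry`, `embed`, and `llm_generate` are hypothetical stand-ins, not the paper's actual components.

```python
# Minimal sketch of RAG-grounded review comment generation.
# `embed` and `llm_generate` are hypothetical callables supplied by the
# caller; the paper does not specify SecureReviewer's retrieval details.
from dataclasses import dataclass

@dataclass
class KnowledgeEntry:
    text: str        # e.g., a CWE description or secure-coding rule
    embedding: list  # precomputed vector for similarity search

def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def review_with_rag(diff: str, kb: list[KnowledgeEntry],
                    embed, llm_generate, k: int = 3) -> str:
    """Retrieve the k most relevant security-knowledge entries and
    ground the generated review comment in them."""
    query = embed(diff)
    top = sorted(kb, key=lambda e: cosine(query, e.embedding), reverse=True)[:k]
    context = "\n".join(f"- {e.text}" for e in top)
    prompt = (
        "You are a security-focused code reviewer.\n"
        f"Relevant security knowledge:\n{context}\n\n"
        f"Code change under review:\n{diff}\n\n"
        "Write a review comment that identifies any security issue "
        "and suggests a concrete fix."
    )
    return llm_generate(prompt)
```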
Exploring the Potential of Large Language Models in Fine-Grained Review Comment Classification
Nguyen, Linh, Liu, Chunhua, Lin, Hong Yi, Thongtanunam, Patanamon
Code review is a crucial practice in software development. As code review nowadays is lightweight, various issues can be identified, and sometimes they can be trivial. Research has investigated automated approaches to classify review comments to gauge the effectiveness of code reviews. However, previous studies have primarily relied on supervised machine learning, which requires extensive manual annotation to train the models effectively. To address this limitation, we explore the potential of using Large Language Models (LLMs) to classify code review comments. We assess the performance of LLMs in classifying 17 categories of code review comments. Our results show that LLMs can classify code review comments, outperforming the state-of-the-art approach that uses a trained deep learning model. In particular, LLMs achieve better accuracy in classifying the five most useful categories, with which the state-of-the-art approach struggles due to a low number of training examples. Rather than relying solely on a specific small training data distribution, LLMs provide balanced performance across high- and low-frequency categories. These results suggest that LLMs could offer a scalable solution for code review analytics and improve the effectiveness of the code review process. Index Terms: code review, review comment classification, prompt engineering, large language models. Code Review (CR) is a practice in software development where developers review other developers' code changes asynchronously to find defects and suggest improvements [1]. Acting as a quality assurance gateway, CR has become mandatory in many prominent organizations, with developers reportedly spending 10-15% of their time on this task [2]. In practice, various types of concerns can be raised in CR comments, ranging from code styling to functional issues. As comments often trigger improvements to code changes, the types of comments play a crucial role in the quality of CR. Constructive and actionable comments that address quality-improving issues positively contribute to CR's overall quality and to the code changes [3]-[5]. On the other hand, trivial or irrelevant comments can waste developers' time without improving the code changes [6].
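As a rough illustration of the prompt-based classification this paper explores, a zero-shot classifier might look like the sketch below; the category list is abbreviated and `llm_complete` is a hypothetical completion function, not the authors' actual prompt or 17-category taxonomy.

```python
# Minimal sketch of prompting an LLM to classify a review comment.
# CATEGORIES is abbreviated for illustration; the paper defines 17 of them.
CATEGORIES = ["functional defect", "code style", "documentation", "other"]

def classify_comment(comment: str, llm_complete) -> str:
    prompt = (
        "Classify the following code review comment into exactly one "
        f"of these categories: {', '.join(CATEGORIES)}.\n\n"
        f"Comment: {comment}\n"
        "Answer with the category name only."
    )
    label = llm_complete(prompt).strip().lower()
    # Fall back to a default when the model answers outside the taxonomy.
    return label if label in CATEGORIES else "other"
```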
AI-Assisted Fixes to Code Review Comments at Scale
Maddila, Chandra, Ghorbani, Negar, Saindon, James, Thakkar, Parth, Murali, Vijayaraghavan, Abreu, Rui, Shen, Jingyue, Zhou, Brian, Nagappan, Nachiappan, Rigby, Peter C.
Aim. There are tens of thousands of code review comments each week at Meta. We developed Metamate for Code Review (MetaMateCR), which provides AI-assisted fixes for reviewer comments in production at scale. Method. We developed an internal benchmark of 64k
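Meta's production pipeline is internal and the abstract is truncated here, but the core idea of turning a reviewer comment into a proposed fix can be sketched; `llm_complete` and the prompt below are assumptions for illustration, not MetaMateCR's actual design.

```python
# Minimal sketch of an AI-assisted fix for a reviewer comment.
# `llm_complete` is a hypothetical completion function.
import difflib

def suggest_fix(file_text: str, comment: str, llm_complete) -> str:
    prompt = (
        "A reviewer left this comment on the file below. "
        "Return the full revised file, changing only what the comment asks.\n\n"
        f"Reviewer comment: {comment}\n\nFile:\n{file_text}"
    )
    revised = llm_complete(prompt)
    # Present the proposed fix as a unified diff the author can accept or reject.
    return "".join(difflib.unified_diff(
        file_text.splitlines(keepends=True),
        revised.splitlines(keepends=True),
        fromfile="before", tofile="after",
    ))
```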
DeepCRCEval: Revisiting the Evaluation of Code Review Comment Generation
Lu, Junyi, Li, Xiaojia, Hua, Zihan, Yu, Lei, Cheng, Shiqi, Yang, Li, Zhang, Fengjun, Zuo, Chun
Code review is a vital but demanding aspect of software development, generating significant interest in automating review comments. Traditional evaluation methods for these comments, primarily based on text similarity, face two major challenges: the inconsistent reliability of human-authored comments in open-source projects and the weak correlation of text similarity with objectives like enhancing code quality and detecting defects. This study empirically analyzes benchmark comments using a novel set of criteria informed by prior research and developer interviews, and then revisits the evaluation of existing methodologies against the same criteria. Our evaluation framework, DeepCRCEval, integrates human evaluators and Large Language Models (LLMs) for a comprehensive reassessment of current techniques based on this criteria set. We also introduce an innovative and efficient baseline, LLM-Reviewer, which leverages the few-shot learning capabilities of LLMs for a target-oriented comparison. Our research highlights the limitations of text similarity metrics, finding that fewer than 10% of benchmark comments are of high enough quality to serve as automation targets. In contrast, DeepCRCEval effectively distinguishes between high- and low-quality comments, proving to be a more reliable evaluation mechanism. Incorporating LLM evaluators into DeepCRCEval significantly boosts efficiency, reducing time and cost by 88.78% and 90.32%, respectively. Furthermore, LLM-Reviewer demonstrates significant potential for keeping comment generation focused on the task's real targets.
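To make the LLM-evaluator idea concrete, here is a minimal criterion-based judging sketch in Python; the criteria list and `llm_complete` are illustrative placeholders, not DeepCRCEval's published criteria or implementation.

```python
# Minimal sketch of an LLM-based evaluator for generated review comments,
# loosely following the idea of criterion-based judging.
import json

CRITERIA = ["relevance to the code change", "clarity", "defect-detection value"]

def judge_comment(diff: str, comment: str, llm_complete) -> dict:
    prompt = (
        "Rate the review comment below on each criterion from 1 (poor) to 5 "
        "(excellent). Respond as a JSON object mapping criterion to score.\n\n"
        f"Criteria: {CRITERIA}\n\nCode change:\n{diff}\n\nComment: {comment}"
    )
    raw = llm_complete(prompt)
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # LLM judges sometimes return malformed JSON; flag for human review.
        return {"error": "unparseable judgment", "raw": raw}
```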
A GPT-based Code Review System for Programming Language Learning
The increasing demand for programming language education and growing class sizes require immediate and personalized feedback. However, traditional code review methods have limitations in providing this level of feedback. As the capabilities of Large Language Models (LLMs) like GPT for generating accurate solutions and timely code reviews have been verified, this research proposes a system that employs GPT-4 to offer learner-friendly code reviews and minimize the risk of AI-assisted cheating. To provide learner-friendly code reviews, a dataset was collected from an online judge system and used to develop and refine the system's prompts. In addition, to minimize AI-assisted cheating, the system flow was designed to provide code reviews only for code submitted by a learner, and a feature that highlights code lines to fix was added. After the initial system was deployed on the web, software education experts conducted a usability test. Based on the results, improvement strategies were developed to strengthen the code review and code correctness check modules, thereby enhancing the system. The improved system was evaluated by software education experts against four criteria: strict code correctness checks, response time, API call costs, and the quality of code reviews. The results demonstrated that the system accurately identifies error types, shortens response times, lowers API call costs, and maintains high-quality code reviews without major issues. Feedback from participants affirmed the tool's suitability for teaching programming to primary and secondary school students. Given these benefits, the system is anticipated to be an efficient learning tool for programming language learning in educational settings.
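A learner-facing review with highlighted fix lines could be wired up roughly as below; the prompt format, the `FIX-LINES` convention, and `llm_complete` are assumptions for illustration, since the paper's tuned prompts are not reproduced in the abstract.

```python
# Minimal sketch of a learner-friendly review that highlights lines to fix.
def review_submission(code: str, problem: str, llm_complete) -> tuple[str, list[int]]:
    # Number the lines so the model can reference them unambiguously.
    numbered = "\n".join(f"{i + 1}: {line}" for i, line in enumerate(code.splitlines()))
    prompt = (
        "You are a friendly tutor. Review this submission without giving away "
        "the full solution. End with a line 'FIX-LINES: <comma-separated line "
        f"numbers>'.\n\nProblem: {problem}\n\nCode:\n{numbered}"
    )
    reply = llm_complete(prompt)
    lines: list[int] = []
    for row in reply.splitlines():
        if row.startswith("FIX-LINES:"):
            lines = [int(t) for t in row.split(":", 1)[1].split(",")
                     if t.strip().isdigit()]
    return reply, lines  # review text plus line numbers to highlight in the UI
```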
Automating Patch Set Generation from Code Review Comments Using Large Language Models
Rahman, Tajmilur, Singh, Rahul, Sultan, Mir Yousuf
The advent of Large Language Models (LLMs) has revolutionized various domains of artificial intelligence, including software engineering. In this research, we evaluate the efficacy of pre-trained LLMs in replicating the tasks traditionally performed by developers in response to code review comments. We provide code contexts to five popular LLMs and obtain the suggested code changes (patch sets) derived from real-world code review comments. The performance of each model is assessed by comparing its generated patch sets against the historical human-generated patch sets from the same repositories. This comparative analysis aims to determine the accuracy, relevance, and depth of the LLMs' feedback, thereby evaluating their readiness to support developers in responding to code review comments. Novelty: This research area is still immature, and substantial further study is needed. No prior research has compared the performance of existing LLMs in responding to code review comments. This in-progress study assesses current LLMs in code review and paves the way for future advancements in automated code quality assurance, reducing the context-switching overhead caused by interruptions from code change requests.
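One simple way to score a generated patch against the historical human patch, in the spirit of this evaluation, is text similarity; the `difflib` ratio below is a plausible stand-in chosen for illustration, not necessarily the paper's metric.

```python
# Minimal sketch: compare an LLM-generated patch against the historical
# human-written patch for the same review comment.
import difflib

def patch_similarity(generated: str, historical: str) -> float:
    """Character-level similarity in [0, 1] between two patch texts."""
    return difflib.SequenceMatcher(None, generated, historical).ratio()

# Scores close to 1.0 mean the model's patch closely matches what the
# developer actually committed in response to the review comment.
score = patch_similarity("return x or 0", "return x if x is not None else 0")
```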
Automated Identification of Toxic Code Reviews Using ToxiCR
Sarker, Jaydeb, Turzo, Asif Kamal, Dong, Ming, Bosu, Amiangshu
Toxic conversations during software development interactions may have serious repercussions for a Free and Open Source Software (FOSS) development project. For example, victims of toxic conversations may become afraid to express themselves, become demotivated, and eventually leave the project. Automated filtering of toxic conversations may help a FOSS community maintain healthy interactions among its members. However, off-the-shelf toxicity detectors perform poorly on Software Engineering (SE) datasets, such as one curated from code review comments. To address this challenge, we present ToxiCR, a supervised learning-based toxicity identification tool for code review interactions. ToxiCR offers a choice of ten supervised learning algorithms, an option to select text vectorization techniques, eight preprocessing steps, and a large-scale labeled dataset of 19,571 code review comments. Two of the eight preprocessing steps are SE domain specific. Through a rigorous evaluation of the models with various combinations of preprocessing steps and vectorization techniques, we identified the best combination for our dataset, which achieves 95.8% accuracy and an 88.9% F1 score. ToxiCR significantly outperforms existing toxicity detectors on our dataset. We have publicly released our dataset, pre-trained models, evaluation results, and source code at: https://github.com/WSU-SEAL/ToxiCR
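The vectorizer-plus-classifier design can be sketched with scikit-learn as below; the SE-specific preprocessing step and the toy training examples are simplified illustrations, not ToxiCR's actual eight-step pipeline or dataset.

```python
# Minimal sketch of a ToxiCR-style pipeline: text vectorization plus one
# of several possible supervised classifiers (logistic regression here).
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

def strip_code_artifacts(text: str) -> str:
    """Toy SE-domain step: drop inline code and URLs before vectorizing."""
    text = re.sub(r"`[^`]*`", " ", text)
    return re.sub(r"https?://\S+", " ", text)

# Illustrative examples only; ToxiCR trains on 19,571 labeled comments.
comments = ["this is garbage, rewrite it", "LGTM, nice refactor `foo()`"]
labels = [1, 0]  # 1 = toxic, 0 = non-toxic

model = Pipeline([
    ("tfidf", TfidfVectorizer(preprocessor=strip_code_artifacts)),
    ("clf", LogisticRegression()),
])
model.fit(comments, labels)
print(model.predict(["thanks, looks good"]))
```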